A High Quality Partial Parser for Annotating German Text Corpora
Abstract
This paper presents a two-stage partial parser for untagged German sentences. In the first stage, the sentence is segmented into units that are easier to parse, following the Topological Field Model. In the second stage, minimal NP, DP and PP phrases, as well as nominal multiword units, are identified within each recognized field. We discuss the results of the second stage, evaluated on 500 parsed sentences from a newspaper corpus. The recall and precision achieved are higher than those reported in the literature so far for comparable systems.
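The abstract does not say how recall and precision were computed. As a rough illustration only, the sketch below assumes exact-match scoring over labelled chunk spans, which is a common convention for chunk evaluation; the function name `precision_recall`, the span representation and the example sentence are all hypothetical and not taken from the paper.

```python
from typing import Set, Tuple

# Assumed representation: (label, start_token, end_token), end exclusive.
Chunk = Tuple[str, int, int]


def precision_recall(predicted: Set[Chunk], gold: Set[Chunk]) -> Tuple[float, float]:
    """Exact-match precision and recall over labelled chunk spans."""
    # A predicted chunk counts as correct only if label and both
    # boundaries match a gold chunk exactly.
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sentence: "Der alte Mann sieht den Hund im Garten"
# tokens:                  0   1    2    3     4   5    6  7
gold = {("NP", 0, 3), ("NP", 4, 6), ("PP", 6, 8)}
predicted = {("NP", 0, 3), ("NP", 4, 6), ("NP", 7, 8)}  # PP missed, spurious NP

p, r = precision_recall(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f}")
```

Under this assumption a chunk with a wrong boundary or label counts as both a precision error and a recall error; a more lenient scheme with partial credit would give different figures.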
Similar resources
Annotating Syllable Corpora with Linguistic Data Categories in XML
The usefulness of high-quality annotated corpora as a development aid in computational linguistic applications is now well understood. It is therefore necessary to have systematic, easily understandable and effective means for annotating corpora at many levels of linguistic description. This paper presents a three-step methodology for annotating speech corpora using linguistic data catego...
Collaboratively Annotating Multilingual Parallel Corpora in the Biomedical Domain―some MANTRAs
The coverage of multilingual biomedical resources is high for the English language, yet sparse for non-English languages—an observation which holds for seemingly well-resourced, yet still dramatically low-resourced ones such as Spanish, French or German but even more so for really under-resourced ones such as Dutch. We here present experimental results for automatically annotating parallel corp...
APOLN: A Partial Parser Of Unrestricted Text
In this paper, we present APOLN (Analizador Parcial de Oraciones en Lenguaje Natural): a partial parser of unrestricted natural language sentences based on finite-state techniques. Partial parsing has been used in several applications: syntactic parsing of unrestricted texts, data extraction systems, machine translation, solving the attachment ambiguity, speech recognition systems, text summari...
Improving Dependency Parsing with Interlinear Glossed Text and Syntactic Projection
Producing annotated corpora for resource-poor languages can be prohibitively expensive, while obtaining parallel, unannotated corpora may be more easily achieved. We propose a method of augmenting a discriminative dependency parser using syntactic projection information. This modification will allow the parser to take advantage of unannotated parallel corpora where high-quality automatic annota...
Automatic Selection of High Quality Parses Created By a Fully Unsupervised Parser
The average results obtained by unsupervised statistical parsers have greatly improved in the last few years, but on many specific sentences they are of rather low quality. The output of such parsers is becoming valuable for various applications, and it is radically less expensive to create than manually annotated training data. Hence, automatic selection of high quality parses created by unsup...
Publication date: 2004